Abstract

We reconsider a previously published (Dalkilic et al.) algorithm for merging lists by 
way of the perfect shuffle permutation. The original publication gave only experimental 
results which, although consistent with linear execution time on the samples tested, 
provided no analysis. Here we prove that the time complexity, in the average case, 
is indeed linear, although there is a Θ(n^2) worst case. This is then the first provably
linear time merge algorithm based on the use of the perfect shuffle. We provide a proof
of correctness, extend the algorithm to the general case where the lists are of unequal
length, and show how it can be made stable, all aspects not included in the original
presentation. We also give a much more concise definition of the algorithm.
1 Introduction


In the context of data processing, to merge lists is to create one list, sorted on some key, 
from the elements of two sorted lists. The standard merge algorithm is simple and uses only 
time linear in the input size but it duplicates the input space. Merge algorithms that use no 
extra space other than the program variables and possibly that space used for recursion are 
sometimes called in-place. Another significant property, besides time and space complexity, 
is “stability”, which means that the order of equal elements is guaranteed to be preserved. A 
linear time, in-place, stable merge algorithm can be used to realise an in-place, stable sorting 
algorithm with optimal time complexity O(n log n) by repeated merging. If one is using the
merge to sort on more than one key, stability is essential. 

The problem of constructing an algorithm with the three desirable properties, linear time,
O(log^2 n) space and stability, has a long history, having been first posed in [Knu73,
Chapter 5, Section 5, Exercise 3]. Since that time several solutions have been proposed.
Partial surveys of this work are given in [EM00, DAT11].

Most solutions use block rotation techniques where segments of the arrays used to represent
the lists are cyclically shifted. A new approach was introduced in [EM00] which begins
the merge by performing a “perfect shuffle” on equal length sub-lists of the input, before any 
comparisons are made. A perfect shuffle intersperses the elements of equal length sub-lists 
so that the order of elements in each list is preserved and every other element in the result 
comes from the same input list. A perfectly shuffled list is of even length. A 2-ordered list 
has the same properties except that it is not necessarily of even length. 
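
As a small illustration of these two definitions, the following Python sketch builds a perfect shuffle by interleaving and checks the 2-ordered property. The function names are ours, the inputs are assumed to be sorted, and the sketch writes to a fresh list rather than working in place.

```python
def perfect_shuffle(left, right):
    """Interleave two equal-length lists: left[0], right[0], left[1], ..."""
    if len(left) != len(right):
        raise ValueError("a perfect shuffle needs lists of equal length")
    return [x for pair in zip(left, right) for x in pair]


def is_two_ordered(a):
    """True if the items at even positions and the items at odd positions
    each form a non-decreasing sequence."""
    return all(a[k] <= a[k + 2] for k in range(len(a) - 2))


shuffled = perfect_shuffle([1, 4, 6, 7], [2, 3, 5, 8])
print(shuffled)                       # [1, 2, 4, 3, 6, 5, 7, 8]
print(is_two_ordered(shuffled))       # True
print(is_two_ordered(shuffled[1:]))   # True: odd length, still 2-ordered
print(shuffled == sorted(shuffled))   # False: shuffled, but not yet merged
```

The last line shows why a tidying phase is still needed after the shuffle.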

The list resulting from the shuffle is not necessarily sorted but one might expect that, on 
the average, elements will already be close to their correct position, after the single shuffle 
operation. It remains only to tidy up the shuffled list. 

In [EM00], the tidying is accomplished by breaking the shuffled list into a sequence of
d-strings, where a d-string is a maximal length sub-list whose first element is greater than
the last. The d-strings are then unshuffled and adjacent unshuffled segments are recursively
merged. The time complexity analysis given showed only time O(n log log n) and that not on
the given algorithm but only on an elaboration thereof. The reported results of experiments
were not consistent with linear time.

In [DAT11] another way to do the tidying was described. This process, described again
below, moves from left to right (or vice versa) taking items from what remains of the shuffled
list and adding them, via a rotation, to an already sorted list. This method is an improvement
because it is relatively simple to implement and does not need recursion, which can bring the
space complexity down to the absolute minimum of O(log n), i.e., only a constant number
of variables is used. Most importantly, experiments gave time complexity results consistent
with linear time. However the authors of [DAT11] did not give an analysis of the time
complexity, only the experimental results.

It is the purpose of this paper to prove that the time complexity of the new algorithm
[DAT11] is, in the average case, indeed linear, although there is a worst case requiring Θ(n^2)
time. We also present the algorithm in a more concise form which we think helps to clarify its
structure and the analysis. We also provide the general form of the algorithm that covers lists
of unequal length, which was missing from the original presentation, and we update a review 
of shuffling methods, which are essential to the simplicity and practicality of the complete 
process. Finally, we remind readers that stability, if required, falls out very naturally from 
the shuffling process and show how it can be realised in this new algorithm. 

Section 2 defines the algorithm and gives a proof of correctness. The proof of average 
case, linear time complexity is given in Section 3. In Section 4 we list some of the more 
promising shuffling methods so far proposed, one of which was not mentioned in [DAT11].
Section 5 shows how to achieve stability, should it be required. 

2 The Algorithm 

Suppose the two sorted lists to be merged are represented by contiguous segments of an 
array A. We also assume that there are no equal elements. The case where there may exist 
equal elements and stability is required is covered in Section 5. We first define the core of 
the process, in Section 2.1, by assuming that the lists to be merged are of equal length, say 
N/2. We generalise to lists of unequal length in Section 2.3. 

2.1 The case of equal length lists 

Suppose the lists are of equal length. The process maintains three lists: a sorted list, S, 
an intermediate list, P, and a 2-ordered list, Sh. Figure 1 illustrates these structures. The
array indices i and j are used to delineate the extents of the three lists. Index i defines the
beginning of P and j the beginning of Sh. The list S comprises A[1] to A[i - 1], P is A[i] to
A[j - 1] and Sh is A[j] to A[N].

The algorithm Right-going-merge, see Algorithm 1, uses four procedures. The procedure 
Scan returns an integer r such that A[j] to A[j + 2r - 1] is a maximal, even length prefix of
Sh, denoted D, such that all odd indexed elements are less than A[i], the first element of P.
Scan is only invoked if |Sh| ≥ 2. The procedure Shuffle performs an in-shuffle on the input
lists, assumed to be of equal length, i.e., only the interior elements are moved, the first and 
last elements are left unmoved. 

The procedure Unshuffle performs the inverse of Shuffle, i.e., an un-in-shuffle, on D to 
produce the two lists O and E. Shuffling methods are discussed in Section 4. The procedure 
Rotate circularly shifts the two adjacent segments of A that represent P and O to the right 
by r. See Figure 1. We call this procedure right-going-merge because the scan proceeds from
left to right. As described in Section 2.3, to handle the case where the lists are not of equal 
length, we also use the mirror image of this procedure, called the left-going-merge, which 
scans from right to left. 
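
To make the three inner procedures concrete, here is a minimal Python sketch of Scan, Unshuffle and Rotate on a 0-based list. The function names and signatures are ours. Unshuffle uses a temporary copy for brevity (the in-place methods are the subject of Section 4), and Rotate uses the well-known three-reversal technique, which is one possible choice and not necessarily the one used in [DAT11].

```python
def scan(a, i, j):
    """Return r such that a[j:j + 2r] is the longest even-length prefix of
    Sh = a[j:] whose 1st, 3rd, 5th, ... items are all less than a[i]."""
    r = 0
    while j + 2 * r + 1 < len(a) and a[j + 2 * r] < a[i]:
        r += 1
    return r


def unshuffle(a, lo, hi):
    """Rearrange a[lo:hi] so the odd indexed items (O) precede the even
    indexed items (E), counting positions from 1 as in the text."""
    d = a[lo:hi]
    a[lo:hi] = d[0::2] + d[1::2]


def reverse(a, lo, hi):
    """Reverse a[lo:hi] in place."""
    hi -= 1
    while lo < hi:
        a[lo], a[hi] = a[hi], a[lo]
        lo, hi = lo + 1, hi - 1


def rotate_right(a, lo, hi, r):
    """Circularly shift a[lo:hi] to the right by r using three reversals."""
    reverse(a, lo, hi)
    reverse(a, lo, lo + r)
    reverse(a, lo + r, hi)


# Shuffle of left = [6, 7, 8, 11] and right = [1, 2, 9, 12]
a = [6, 1, 7, 2, 8, 9, 11, 12]
i, j = 0, 1                     # P = a[0:1], Sh = a[1:]
r = scan(a, i, j)               # D = a[1:5] = [1, 7, 2, 8], so r = 2
unshuffle(a, j, j + 2 * r)      # O = [1, 2], E = [7, 8]
rotate_right(a, i, j + r, r)    # P O -> O P
print(r, a)                     # 2 [1, 2, 6, 7, 8, 9, 11, 12]
```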
[Figure 1: the lists S, P and Sh after scanning, after unshuffling, after rotating and after adjusting the lists]


Create Sh by applying Shuffle to the two equal length lists.
{Recall that A[i] is P[1] and A[j] is Sh[1]}
i := index of first element in Sh; j := i + 1;
while not Sh is empty do
    if P[1] < Sh[1] then {adjust lists} i := i + 1;
        if |P| = 0 then j := j + 1; complement(type) fi
    else if |Sh| = 1 then r := 1; Rotate; {adjust lists} i := i + 1; j := j + 1
    else {Figure 1} Scan; Unshuffle; Rotate;
        {adjust lists} i := i + r + 1; j := j + 2r fi fi
endwhile;


Algorithm 1: Right-going-merge 
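
The loop structure can be exercised with the following Python sketch, which is an illustration under stated assumptions and not the implementation of [DAT11]. It uses 0-based indices, assumes distinct keys, and performs the shuffle, unshuffle and rotation with list slices, so it is not in place. It also makes one deliberate, conservative change: after the rotation it advances i by r only and leaves P[1] to the comparison at the top of the next iteration. We make this assumption so that the sketch remains correct when Sh has odd length and its final element is smaller than P[1]; the cost is at most one extra comparison per traversal. The variable type is omitted because it is not needed when the keys are distinct and the lists have equal length.

```python
import random


def right_going_merge(left, right):
    """Merge two sorted, equal-length lists of distinct keys (a sketch)."""
    if len(left) != len(right):
        raise ValueError("this sketch handles equal-length lists only")
    # Shuffle: left[0], right[0], left[1], right[1], ...
    a = [x for pair in zip(left, right) for x in pair]
    n = len(a)
    i, j = 0, 1          # S = a[:i], P = a[i:j], Sh = a[j:]
    while j < n:         # while Sh is not empty
        if a[i] < a[j]:                  # P[1] < Sh[1]: P[1] joins S
            i += 1
            if i == j:                   # P is empty: Sh[1] becomes the new P
                j += 1
        elif n - j == 1:                 # |Sh| = 1: rotate it in front of P
            a[i:j + 1] = [a[j]] + a[i:j]
            i += 1
            j += 1
        else:
            r = 0                        # Scan
            while j + 2 * r + 1 < n and a[j + 2 * r] < a[i]:
                r += 1
            d = a[j:j + 2 * r]
            odd, even = d[0::2], d[1::2]             # Unshuffle D into O and E
            a[i:j + 2 * r] = odd + a[i:j] + even     # Rotate: P O E -> O P E
            i += r                       # assumption: r, not r + 1 (see above)
            j += 2 * r
    return a


random.seed(0)
keys = random.sample(range(100), 12)
left, right = sorted(keys[:6]), sorted(keys[6:])
assert right_going_merge(left, right) == sorted(keys)
print(right_going_merge([3, 4], [1, 2]))   # [1, 2, 3, 4]
```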
2.2 Correctness on equal length lists

We call the input list occupying the lower indexed part of the array the left list and the 
other the right list. We define the type of an element to be either left or right depending on 
which of the input lists it came from.

The process maintains the following properties of the three lists which constitute a loop 
invariant assertion: 

A0: P is not empty.

A1: the elements in S are in order and there is no element in S greater than any element
in P or in Sh. 

A2: the elements in P are all of the same type and are ordered. 

A3: Sh is 2-ordered and such that all odd elements are of one type and all the even elements 
are of the other type. 

A4: all elements in P are less than any element of the same type in Sh. 

A5: the first elements in P and in Sh are of different types. 

We use the convention that S[k], P[k] and Sh[k] refer to the k-th element in the respective
list, not the k-th element in A. The initial Shuffle and definitions of i and j, before the loop,
leave S empty, P with what was the first element of Sh and Sh 2-ordered. So A0 - A5 hold
at first entry to the loop. When the loop is exited Sh is empty, since the loop condition is
then false. Then, by A1 and A2, A[1] through A[N] is ordered. It remains to show that if
A0 through A5 hold at the top of the loop they still hold at the bottom.

The body of the loop includes three sets of actions, chosen by the if statements. At entry 
to the loop we know that Sh is not empty, by the loop condition, and P is not empty, by 
A0. At the first if, if P[1] < Sh[1] then, by A1 - A5, P[1] is the next smallest element and
should be added to S, which is accomplished by incrementing i. That shortens P. If P is
now empty then j := j + 1 restores A0, A2, A4 and A5 because of A3, otherwise A0 - A5
are unaffected. 

At the else if statement we know that P[1] > Sh[1]. Hence, if there is only one element
remaining in Sh then, by A1, A2 and A5, that element should be inserted between the end
of S and the beginning of P, at which point S is ordered. That is accomplished by the right
rotation by one. The length of P is unchanged but P is shifted one to the right, which preserves
A0. Setting i := i + 1 and j := j + 1 shifts P and empties Sh, which terminates the loop.

At entry to the last else section we have that |Sh| ≥ 2 so that the scan procedure can
function and we have the general situation illustrated in Figure 1. The Unshuffle, which
is an un-in-shuffle, creates E, all the even indexed elements of D, and O, the odd indexed
elements. By A5, all the elements of E are of the same type as P and, by A4, greater than
any element of P. All the elements in O are, by Scan, less than P[1] and, by A1, all greater
than any element of S. Hence the rotation and redefinition of the lists preserves A1 - A5. P
loses only one element, P[1], and gains at least one from E, which preserves A0.

At each traversal of the loop i is incremented by at least one, increasing the length of 
S. But P is never empty. Hence at some point Sh must become empty and the process 
terminates. 

2.3 The General Case 

Algorithm 1 just described, as in [DAT11], will only handle lists of equal length. As first
described in [EM00], to handle the general case of unequal lengths we use a mirror image of
the right-going-merge called the left-going-merge. The left-going-merge scans from right to 
left, with appropriately modified element comparisons, and the rotate shifts to the left by r. 

If the input lists are of unequal lengths, we remove a prefix or a suffix, T, from the 
longer list to make the lengths equal. Then we invoke either the left or right going merges 
depending on whether T was a prefix or a suffix. At the end of the merge, P may not be empty,
in which case it may need to be merged into T. But that is only necessary if P and T are
of opposite types. The process is illustrated in Figure 2 and described in Algorithm 2.

The variable type has just two possible values, left and right. The value of type indicates 
the type of the elements in P, i.e., whether they were originally from the left or right list. 
The value of type is initialised by the outer layer of the algorithm. See Algorithm 2. It 
can change during the merge procedures but the only place where it changes is where |P| is 
decreased to zero. At that point, see Algorithm 1, type is complemented.

2.4 Correctness of the general case 

The correctness of the general case follows from the correctness of the case where the lists 
are of equal length and the following observations. The loop in Algorithm 2 sets the value of 
type correctly, depending on which merge is going to be invoked. Because the loop condition 
ensures that P is not empty at the termination of either merge procedure, P is not empty 
at the return from those procedures in the loop. Hence the value of type indicates whether 
the elements in P at this point were originally from the left or right input lists. It follows 
that, if P and T are of the same type, then by A0, A2 and the fact that, by construction, all
elements of T are greater than any element in P, the entire array is ordered. Otherwise
nothing is known about the relative values of the elements in P and T and so the process is 
repeated with P and T as the input lists. 
Figure 2: The case of unequal length lists (panels: at entry; after removal of suffix; after merge of L and R)

{Let L and R denote the input left and right lists, respectively}
complete := false;
while not complete do
    if |R| > |L|
    then {right list is longer than left list}
        remove a suffix, say T, from R so that |L| = |R|;
        type := left; apply the right-going-merge to L and R;
        if type = left then redefine L as P and R as T else complete := true fi
    else if |L| > |R| {left list is longer than right list}
        then remove a prefix, say T, from L so that |L| = |R|;
        type := right; apply the left-going-merge to L and R;
        if type = right then redefine L as T and R as P else complete := true fi
    else {the lists are of equal length}
        type := left; apply the right-going-merge to L and R; complete := true fi
    fi
endwhile


Algorithm 2: The outer layer of the algorithm 
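
The outer layer can be sketched in Python as follows, again under assumptions that should be read as ours. Keys are distinct, the work is done on auxiliary lists and is not in place, and the left-going merge is obtained as a mirror image by reversing both lists, swapping their roles and flipping the comparison. The helper names rg_merge, head and tail are hypothetical. As a conservative choice the inner merge advances i by r after a rotation and lets the next comparison decide whether P[1] joins S. Empty inputs are handled by an explicit guard.

```python
def rg_merge(left, right, less):
    """Right-going merge of equal-length lists sorted under `less`.
    Returns (array, index where the final P starts, True if P is of type left)."""
    a = [x for pair in zip(left, right) for x in pair]      # shuffle
    n, i, j, p_is_left = len(a), 0, 1, True
    while j < n:
        if less(a[i], a[j]):
            i += 1
            if i == j:                  # |P| = 0: complement(type)
                j += 1
                p_is_left = not p_is_left
        elif n - j == 1:
            a[i:j + 1] = [a[j]] + a[i:j]
            i, j = i + 1, j + 1
        else:
            r = 0
            while j + 2 * r + 1 < n and less(a[j + 2 * r], a[i]):
                r += 1
            d = a[j:j + 2 * r]
            a[i:j + 2 * r] = d[0::2] + a[i:j] + d[1::2]
            i, j = i + r, j + 2 * r
    return a, i, p_is_left


def merge(L, R):
    """Merge two sorted lists of distinct keys, of any lengths (a sketch)."""
    lt = lambda x, y: x < y
    gt = lambda x, y: y < x
    head, tail = [], []     # finished output before and after the active part
    while L and R:
        if len(R) > len(L):
            R, T = R[:len(L)], R[len(L):]           # T is a suffix of R
            a, p, p_is_left = rg_merge(L, R, lt)
            if not p_is_left:                       # P and T are both of type right
                return head + a + T + tail
            head, L, R = head + a[:p], a[p:], T     # merge P with T next
        elif len(L) > len(R):
            k = len(L) - len(R)
            T, L = L[:k], L[k:]                     # T is a prefix of L
            # mirror image: in the mirrored call "left" means our right list
            m, p, p_is_right = rg_merge(R[::-1], L[::-1], gt)
            a, q = m[::-1], len(m) - p              # P now occupies a[:q]
            if not p_is_right:                      # P and T are both of type left
                return head + T + a + tail
            tail, L, R = a[q:] + tail, T, a[:q]     # merge T with P next
        else:
            a, _, _ = rg_merge(L, R, lt)
            return head + a + tail
    return head + L + R + tail


print(merge([1, 4, 7, 9], [5, 8]))    # [1, 4, 5, 7, 8, 9]
print(merge([5, 6], [1, 2, 3, 4]))    # [1, 2, 3, 4, 5, 6]
```

Each pass through the outer loop moves at least len(L) elements (or len(R) in the mirrored case) into head or tail, so the active part shrinks and the loop terminates.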
3 Time Complexity

First consider the procedures left-going-merge and right-going-merge. 

3.1 A worst case 

There is an Ω(n^2) worst case. Suppose that, during the first pass through the inner loop,
Scan finds D to be of length N/2. Then the unshuffle leaves P of length N/4. Then
suppose that N/4 of the odd indexed elements in Sh of the opposite type to P need to be
interspersed evenly among the elements of P. For example, suppose P is a sequence of even
integers and the odd elements in Sh are the missing odd integers. This requires rotating P,
of average length N/8, N/4 times, i.e., Ω(n^2) time.
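
The quadratic behaviour can be observed with a small experiment. The Python sketch below is our own instrumentation, not taken from [DAT11]: it runs a slice-based version of the right-going merge (distinct keys, i advanced by r after each rotation as a conservative choice) and counts the elements touched by rotations. The function adversarial is a hypothetical generator that builds one input of the kind just described, with lists of length 2m over the keys 1 to 4m.

```python
def rotation_cost(left, right):
    """Run the right-going merge; return (merged list, elements moved by rotations)."""
    a = [x for pair in zip(left, right) for x in pair]
    n, i, j, moved = len(a), 0, 1, 0
    while j < n:
        if a[i] < a[j]:
            i += 1
            if i == j:
                j += 1
        elif n - j == 1:
            a[i:j + 1] = [a[j]] + a[i:j]
            moved += j - i + 1               # P plus the last item of Sh
            i, j = i + 1, j + 1
        else:
            r = 0
            while j + 2 * r + 1 < n and a[j + 2 * r] < a[i]:
                r += 1
            d = a[j:j + 2 * r]
            a[i:j + 2 * r] = d[0::2] + a[i:j] + d[1::2]
            moved += (j - i) + r             # P and O are rotated
            i, j = i + r, j + 2 * r
    return a, moved


def adversarial(m):
    """The first m right keys precede everything; the next m right keys
    alternate with left keys, so each one forces a rotation of a long P."""
    left = ([m + 1] + [m + 1 + 2 * k for k in range(1, m + 1)]
            + list(range(3 * m + 2, 4 * m + 1)))
    right = list(range(1, m + 1)) + [m + 2 * k for k in range(1, m + 1)]
    return left, right


for m in (100, 200, 400):
    merged, moved = rotation_cost(*adversarial(m))
    assert merged == list(range(1, 4 * m + 1))
    print(4 * m, moved)      # the count roughly quadruples when N doubles
```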

3.2 The average case 

The standard definition of average or expected time on inputs of a given size is the total of 
all execution times over all instances of that size divided by the number of instances of that 
size, i.e., we consider all instances to be equally likely. We show that, despite the existence of 
a worst case, the average case time complexity is linear. In Lemmas 3.3, 3.4 and 3.5, each of 
the procedures, Scan, Unshuffle and Rotate, is shown to take constant time on the average. 
This is because, although the execution time is proportional to the length of the input, the 
probability that the input is of a certain length decreases exponentially with that length. 

During the Scan procedure the first element in P, P[1], is compared to odd elements in
Sh of opposite type until an element greater than P[1] is found, or until Sh is exhausted.
We denote the prefix of Sh thus discovered by D and its length by 2r, since its length is 
even by definition. 

Lemma 3.1 If r > 0 then Pr(|D| = 2r) ≤ 1/2^r.

Proof

Let the number of elements in P, which are all from one list, say L, plus the number of L
elements in Sh, be n and the number of R elements in Sh be m. The number of possible
merged arrangements of these n + m elements is (n + m)!/(n! m!).

Suppose Scan defines D such that |D| = 2r > 0. We note that m ≥ r and n ≥ m.
After the Unshuffle, Rotate and redefinition of the list, the number of L elements in P and
Sh is n - 1 and the number of R elements in Sh is m - r. See Figure 1. The number of
arrangements consistent with this fact is (n + m - r - 1)!/((n - 1)!(m - r)!). Hence the
probability that |D| = 2r is given by:

\[
\begin{aligned}
\Pr(|D| = 2r) &= \frac{(n+m-r-1)!\; n!\; m!}{(m-r)!\,(n-1)!\,(n+m)!} \\
&= \frac{n \cdot m(m-1)\cdots(m-r+1)}{(n+m)(n+m-1)\cdots(n+m-r)} \\
&= \frac{n}{n+m-r} \times \frac{m}{n+m} \times \frac{m-1}{n+m-1} \times \cdots \times \frac{m-r+1}{n+m-r+1} \\
&\le \frac{1}{2^{r}}
\end{aligned}
\]

because m ≥ r implies n/(n + m - r) ≤ 1 and, for all 0 ≤ k, (m - k)/(n + m - k) ≤ 1/2.

□
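
As a sanity check on the bound, the ratio of arrangement counts used in this proof can be evaluated exactly for small parameters. The Python sketch below assumes the conditions n ≥ m ≥ r ≥ 1 and compares the ratio with 1/2^r using exact rational arithmetic; the function name pr_d is ours. It checks only the inequality, not the probabilistic model behind it.

```python
from fractions import Fraction
from math import comb


def pr_d(n, m, r):
    """Arrangements consistent with |D| = 2r divided by all arrangements
    of n + m elements: C(n+m-r-1, n-1) / C(n+m, n)."""
    return Fraction(comb(n + m - r - 1, n - 1), comb(n + m, n))


for n in range(1, 31):
    for m in range(1, n + 1):
        for r in range(1, m + 1):
            assert pr_d(n, m, r) <= Fraction(1, 2 ** r)

print(pr_d(2, 2, 1), pr_d(2, 2, 2))    # 1/3 1/6
```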


Let P_b and S_b be the P and S lists, respectively, at the bottom of the while loop, after
all rearrangements and adjustments to the lists. See Figure 1.

Lemma 3.2 Pr(|P_b| = p) ≤ 1/2^(p-1).

Proof

Suppose there are n elements of type left and m of type right distributed across S_b and P_b.
Then m is within one of n because the numbers of each type remaining in Sh are within
one of each other and the input lists were of equal length. The number of possible merged
arrangements of these m + n elements is (m + n)!/(m! n!).

Without loss of generality, suppose the elements in P_b are of type left. Then the number of
arrangements consistent with the existence of |P_b| = p elements all greater than any element
in S_b is the number of ways that S_b can result from the merge of n with m - p elements, i.e.,
(m + n - p)!/((m - p)! n!). Hence the probability that |P_b| = p is given by:

\[
\begin{aligned}
\Pr(|P_b| = p) &= \frac{(m+n-p)!}{(m-p)!\; n!} \times \frac{m!\; n!}{(m+n)!} \\
&= \frac{m(m-1)(m-2)\cdots(m-p+1)}{(m+n)(m+n-1)(m+n-2)\cdots(m+n-p+1)} \\
&= \frac{m}{m+n} \times \frac{m-1}{m+n-1} \times \frac{m-2}{m+n-2} \times \cdots \times \frac{m-p+1}{m+n-p+1} \\
&\le \frac{1}{2^{p-1}}
\end{aligned}
\]

because m/(m + n) ≤ 1 and, for all 0 < k < n, (m - k)/(m + n - k) ≤ 1/2.


□ 


Lemmas 3.1 and 3.2 allow us to show that the expected time complexity of all the loop
procedures is a constant.


Lemma 3.3 The expected time used by the Scan procedure is constant. 


Proof 

The time taken to scan 2r elements is k_1 r, for some constant k_1. Hence the expected time,
expt-scan, is given by:

\[
\text{expt-scan} = \sum_{r} k_1\, r \Pr(|D| = 2r) \le \sum_{r=1}^{\infty} \frac{k_1 r}{2^{r}} = 2k_1,
\]

by Lemma 3.1. □
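
The closed form of the series used here, and again in Lemmas 3.4 and 3.5, is a standard identity. A short derivation, valid for |x| < 1, differentiates the geometric series and multiplies by x:

```latex
\[
\sum_{r=0}^{\infty} x^{r} = \frac{1}{1-x}
\;\Longrightarrow\;
\sum_{r=1}^{\infty} r\,x^{r-1} = \frac{1}{(1-x)^{2}}
\;\Longrightarrow\;
\sum_{r=1}^{\infty} r\,x^{r} = \frac{x}{(1-x)^{2}} .
\]
\[
x = \tfrac{1}{2}:\qquad
\sum_{r=1}^{\infty} \frac{r}{2^{r}} = \frac{1/2}{(1/2)^{2}} = 2,
\qquad
\sum_{p=1}^{\infty} \frac{1}{2^{p}} = 1,
\qquad
\sum_{p=1}^{\infty} \frac{p+1}{2^{p}} = 2 + 1 = 3 .
\]
```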


Lemma 3.4 The expected time used by the Unshuffle procedure is constant.

Proof

The time required to Unshuffle a list D, where |D| = 2r, is k_2 r, for some constant k_2. See
Section 4 below. Hence the expected time, expt-shuff, is given by:

\[
\text{expt-shuff} = \sum_{r} k_2\, r \Pr(|D| = 2r) \le \sum_{r=1}^{\infty} \frac{k_2 r}{2^{r}} = 2k_2,
\]

by Lemma 3.1 and the fact that the unshuffle works on the result of the scan. □


Lemma 3.5 The expected time used by the Rotate procedure is constant. 

Proof 

|P_b| + 1 elements are rotated, where P_b is list P at the bottom of the loop, i.e., after rotation.
The time taken to rotate |P_b| + 1 elements is ≤ (|P_b| + 1)k_3, for some constant k_3. Hence the
expected time, expt-rot, is given by:

\[
\text{expt-rot} = \sum_{p} (p+1)\,k_3 \Pr(|P_b| = p)
\le \sum_{p=1}^{\infty} \frac{(p+1)\,k_3}{2^{p}}
= \sum_{p=1}^{\infty} \frac{p\,k_3}{2^{p}} + \sum_{p=1}^{\infty} \frac{k_3}{2^{p}}
= 3k_3,
\]

by Lemma 3.2. □


Theorem 3.1 The average time complexity of the algorithm is O(n), where n is the combined
length of the lists.

Proof

Consider the merge procedures. The actions inside the loop are either constant time operations
or scans, rotations or unshuffles which, by Lemmas 3.3, 3.4, 3.5 and the “linearity
of expectations” [Cor01, Appendix C], are expected constant time operations. Hence the
expected time to traverse the while loop is a constant.

Now consider the general case where the problem is broken down to the merge of a
sequence of equal length lists, say n_1, n_2, ..., n_k. We observe that n_1 > n_2 > ... > n_k and that
n_1 + n_2 + ... + n_k = n. Since each merge takes time O(n_i), the total time is O(n). □
4 Shuffling

An algorithm that performs the perfect shuffle, in-place, is not obvious. In the original paper 
[EM00], amongst other suggestions, it was pointed out that a simple cycle leader algorithm
that used one bit of extra memory per element would provide a simple solution. The extra
bits are used to mark elements already moved and permit finding an unused element, i.e., a
new leader, from which to start a new cycle when the current cycle ends. Although one bit
per element violates the O(log^2 n) space restriction, it might be acceptable in practice.
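
A minimal Python sketch of such a marked cycle leader shuffle is given below; it is an illustration and not the code of [EM00]. It relies on the fact that, with 0-based positions and the first and last elements fixed, the shuffle of an array of even length n sends position k to 2k mod (n - 1). The name shuffle_in_place is ours, and a bytearray stands in for the mark bits, so the sketch actually spends one byte per element.

```python
def shuffle_in_place(a):
    """Perfect-shuffle the two halves of `a` in place; the first and last
    elements are left unmoved."""
    n = len(a)
    if n % 2:
        raise ValueError("expected two halves of equal length")
    moved = bytearray(n)              # one mark per element
    m = n - 1                         # position k moves to 2k mod (n - 1)
    for leader in range(1, m):
        if moved[leader]:
            continue                  # already placed by an earlier cycle
        k, item = leader, a[leader]
        while True:
            k = (2 * k) % m
            a[k], item = item, a[k]   # drop the carried item, pick up the displaced one
            moved[k] = 1
            if k == leader:
                break


data = [1, 2, 3, 4, 'a', 'b', 'c', 'd']
shuffle_in_place(data)
print(data)    # [1, 'a', 2, 'b', 3, 'c', 4, 'd']
```

Running the same cycles with the inverse map would give the corresponding unshuffle.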

We cite two papers that suggest practical solutions that function within the desirable
constraints, i.e., O(log^2 n) space and linear time. In [Jai08] a cycle leader algorithm is given
in which some simple number theory is used to generate the “leaders”. It uses linear time
and absolutely minimum space, i.e., O(log n).

In [YEMR13] it is shown that the shuffle can be attained by a sequence of element
exchanges where no element participates in more than two exchanges. A couple of ways of
computing the identity of the elements to be exchanged are provided. One of these methods
uses linear time and O(log^2 n) space.


5 Stability 

A useful feature of using the perfect shuffle is that stability falls out without much extra 
effort. We note that neither the shuffling, unshuffling nor the rotations, which are the 
only processes that move elements, can change the order of any pair of elements that are 
originally from the same list. Comparisons between elements from different lists are done 
in the scan procedure and at the first if in the loop. So it only remains to be sure that 
those comparisons give the correct priority when comparing equal elements. We note that 
comparisons are always between the first element in P and an odd indexed element in Sh 
which are, by A3 and A5, of opposite types. So assigning correct priority is easily done by 
using the current value of type which tells us the current type of the elements in P. 
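
The tie-breaking rule can be sketched in Python as follows. This is our illustration under stated assumptions: equal length lists, slice-based shuffling and rotation (not in place), i advanced by r after each rotation as a conservative choice, and a caller-supplied key function. The flag p_is_left plays the role of type, and the hypothetical helper p_first gives P[1] priority over an equal element of Sh exactly when P holds elements of the left list.

```python
def stable_merge(left, right, key=lambda x: x):
    """Stable right-going merge of two equal-length lists sorted by `key`."""
    a = [x for pair in zip(left, right) for x in pair]
    n, i, j, p_is_left = len(a), 0, 1, True

    def p_first(p, s):
        # On equal keys the element of the left list must come first.
        kp, ks = key(p), key(s)
        return kp < ks or (kp == ks and p_is_left)

    while j < n:
        if p_first(a[i], a[j]):
            i += 1
            if i == j:                      # |P| = 0: complement(type)
                j += 1
                p_is_left = not p_is_left
        elif n - j == 1:
            a[i:j + 1] = [a[j]] + a[i:j]
            i, j = i + 1, j + 1
        else:
            r = 0
            while j + 2 * r + 1 < n and not p_first(a[i], a[j + 2 * r]):
                r += 1
            d = a[j:j + 2 * r]
            a[i:j + 2 * r] = d[0::2] + a[i:j] + d[1::2]
            i, j = i + r, j + 2 * r
    return a


left = [(1, 'a'), (2, 'b'), (2, 'c')]
right = [(1, 'x'), (2, 'y'), (3, 'z')]
print(stable_merge(left, right, key=lambda rec: rec[0]))
# [(1, 'a'), (1, 'x'), (2, 'b'), (2, 'c'), (2, 'y'), (3, 'z')]
```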

6 Conclusions 

We have given an analysis of the time complexity of the improved merge via shuffling algorithm
presented originally in [DAT11] where only experimental timing results on special
cases were presented. We have shown that, although there is a Θ(n^2) worst case, the average
case is linear time. We provide a more concise definition of the algorithm which allows
a proof of correctness. We have expanded the description of the algorithm to include the
general case, where it was originally restricted to the case of equal length lists and we have
included a mechanism that ensures stability, not given in the original. We have cited one
more recent shuffling method, not mentioned in the original paper.

We suggest that, by using the recently described shuffling techniques cited in Section 4,
this algorithm may provide a practical alternative to data processes willing to trade a slower
execution time for a halving of internal memory usage.


References 


[Cor01] Thomas H. Cormen. Introduction to Algorithms. Cambridge, 2001.

[DAT11] Mehmet Emin Dalkilic, Elif Acar, and Gorkem Tokatli. A simple shuffle-based
stable in-place merge algorithm. Procedia CS, 3:1049-1054, 2011. 

[EM00] Ellis and Markov. In situ, stable merging by way of the perfect shuffle. The 
Computer Journal , 43, 2000. 


[Jai08] Peiyush Jain. A simple in-place algorithm for in-shuffle, CoRR 0805.1598, 2008. 

[Knu73] Donald Knuth. The Art of Computer Programming, Vol. 3, Sorting and Searching.
Addison-Wesley, 1973.


[YEMR13] Qingxuan Yang, John A. Ellis, Khalegh Mamakani, and Frank Ruskey. In-place
permuting and perfect shuffling using involutions. Inf. Process. Lett., 113(10-11):386-391,
2013.





